The problem of proving programs correct has existed since the first
computers  were  constructed,  and  has  steadily  grown  more  pressing  as 
programs have increased in complexity. The large number of papers (close
to 800) that have been written in this area has established program
verification as a recognised subfield of computer science. But in spite
of all this work, a truly practical program verification technology has
been  slow  to  develop,  and  is  still  not  at  hand.   Indeed,  the  history  to 
date  of  program  verification  makes  one  key  fact  quite  clear:   The  problem 
of  verifying  large  programs  rigorously  is  a  very  difficult  one,  and  its 
solution  will  imply  significant  changes  in  current  programming  technology. 
This  paper  will  review  various  salient  aspects  of  the  verification  problem, 
assess progress to date, point out major difficulties which remain, and
attempt  to  project  future  directions  of  progress. 

2.   How  is  it  possible  to  prove  programs  correct? 

To  prove  programs  correct,  one  can  begin  by  giving  formal  mathematical 
definition  to  the  notions  'program'  and  'program  execution'.   Essentially, 
these  mathematical  definitions  merely  formalise  the  basic  facts  concerning 
programs  and  their  meaning  that  one  would  present  in  a  first  course  on 
programming.  A  'program'  can  be  defined  formally  as  a  sequence  of  'statements', 
where  each  statement  is  a  syntactic  structure  of  one  of  the  three  forms 

variable  :=  expression; 
or       if expression then substatement else substatement;
or       while expression do substatement;

(Of  course,  depending  on  the  richness  of  the  programming  language  L  with 
which  one  is  working,  a  few  other  forms  of  statement  may  also  have  to  be 
considered.)   The  'state'  of  a  program  is  a  mapping  which  defines  the  current 
value  of  every  variable  appearing  in  the  program,  and  which  also  defines  a 
'current  control  location',  i.e.,  the  particular  statement  of  the  program 
which  is  to  be  executed  next.   Execution  of  a  statement  changes  this  state 
according to familiar, well defined rules (these are just the rules that
have to be explained in an elementary programming course to students meeting
the programming language L for the first time; we can call them the
'transition rules' of L). The 'state sequence' corresponding to a
program  P  is  the  sequence  of  states  whose  first  state  is  defined  by  the 
initialisation  rules  of  the  language  L  ,  and  in  which  each  subsequent 
state is obtained from the state which precedes it by application of the
appropriate  transition  rule  of  L  .   Thus  the  state  sequence  of  the 
program  P  is  uniquely  defined  by  the  text  of  P  .   To  prove  P  correct 
is  to  prove  some  desired  property  of  this  state  sequence  rigorously. 
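
The definitions above can be made concrete with a short sketch. The tuple encoding of statements, the function name `state_sequence`, and the sample program are our own illustrative assumptions, not part of any particular language L; for brevity the recorded states hold only variable values and omit the control location.

```python
def state_sequence(program, initial, max_steps=10_000):
    """Return the list of states produced by running `program`.

    Hypothetical statement encoding (an assumption of this sketch):
      ("assign", var, expr)            variable := expression
      ("if", cond, then_blk, else_blk) if ... then ... else ...
      ("while", cond, body)            while ... do ...
    Expressions are functions from a state (a dict) to a value.
    """
    state = dict(initial)
    trace = [dict(state)]          # first state: the initialisation
    steps = 0

    def tick():
        nonlocal steps
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded; program may not terminate")

    def execute(block):
        for stmt in block:
            kind = stmt[0]
            if kind == "assign":
                _, var, expr = stmt
                tick()
                state[var] = expr(state)      # transition rule for assignment
                trace.append(dict(state))
            elif kind == "if":
                _, cond, then_blk, else_blk = stmt
                execute(then_blk if cond(state) else else_blk)
            elif kind == "while":
                _, cond, body = stmt
                while True:
                    tick()
                    if not cond(state):
                        break
                    execute(body)
            else:
                raise ValueError(f"unknown statement form: {kind!r}")

    execute(program)
    return trace


# Sample program: s := 0; i := 0; while i < n do (s := s + i; i := i + 1)
program = [
    ("assign", "s", lambda st: 0),
    ("assign", "i", lambda st: 0),
    ("while", lambda st: st["i"] < st["n"], [
        ("assign", "s", lambda st: st["s"] + st["i"]),
        ("assign", "i", lambda st: st["i"] + 1),
    ]),
]
trace = state_sequence(program, {"n": 3})
```

A correctness claim such as "on termination s is the sum of 0, ..., n-1" is then a property of the last element of `trace`; proving it for every n, rather than observing it for one n, is the verification problem.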

Although  direct,  elementary,  and  mathematically  adequate,  the  approach 
to  program  correctness  that  has  just  been  outlined  is  unsatisfactory  in 
practical  terms.   To  see  why  this  is  so,  we  can  begin  by  remarking  that 
a  satisfactory  verification  technology  must  aim  at  more  than  the  informal 
or  semi-formal  type  of  reasoning  customary  in  ordinary  published  mathematics. 
Indeed,  reasoning  of  this  type  does  not  prevent  the  intrusion  of  any  number 
of  small  errors,  many  of  them  of  exactly  the  sort  that  cause  program  bugs. 
We  cannot  remain  content  to  'prove'  the  correctness  of  a  program  containing 
minor errors by a 'proof' which also contains minor errors! Rather,
verification technology must allow proofs to be expressed in computer
readable form, and to be certified formally by a programmed proof-checker
or  theorem-prover.   Given  this  requirement,  the  burden  of  detail  inherent 
in  any  attempt  to  consider  state  sequences  directly  is  readily  seen  to  be 
overwhelming,  and  one  comes  rapidly  to  the  conclusion  that  a  considerably 
closer  link  between  programs  and  logical  formulae  acceptable  to  a  computerised 
proof  verifier  is  necessary. 

3.   Formalisms  expediting  proof  of  program  correctness. 

To  build  an  adequate  link  between  program  texts  on  the  one  hand  and 
logical  formulae  on  the  other,  several  paths  are  possible.   One  approach 
is  to  convert  programs  P  into  formulae  F  of  logic  in  such  a  way  that 
proof  of  F  will  guarantee  correctness  of  P  ;  this  is  the  method  introduced 
by  Floyd  in  1967.   Another  approach  is  to  extend  the  rules  of  logic  so  that 
they  apply  directly  to  program-like  objects.   This  leads  to  the  definition 
of  a  'logic  of  programs'  within  which  a  formal  notion  of  validity  for 
program-like  objects  exists  and  in  which  these  objects  play  very  much  the 
same  role  that  formulae  of  predicate  calculus  play  in  standard  symbolic 
logic. 


In  the  first,  Floyd  approach,  one  is  given  a  program  text  P  ,  to 
which an input assumption and an output assertion are attached. (These
assertions  can  be  written  in  any  convenient,  sufficiently  powerful, 
logical  formalism,  e.g.  predicate  calculus  supplemented  by  set  theory.) 
Call  a  program  text  thus  annotated  (i.e.,  carrying  attached  assumptions 
and  assertions)  an  annotated-program.   To  prove  an  annotated-program 
correct  by  the  Floyd  method,  one  begins  by  attaching  additional  assertions 
to  its  text.   Generally  speaking,  at  least  one  additional  assertion  needs 
to be attached to each loop in the program text which we are trying to
prove correct; in effect, this assertion captures and formalises the idea
which led the programmer to write the loop in the first place. The
annotated-program P' developed by adding these additional assertions
to P is then processed by a verification condition generator, which by
a straightforward process converts P' into a set of logical formulae F.
Then  to  complete  the  verification  one  must  get  a  proof  checker  to  accept 
all  these  formulae. 

Overall,  then,  verification  of  a  program  P  by  the  Floyd  method  involves 
three  major  steps.   We  must 

(a)  Devise  the  right  set  of  predicate  assertions  and  attach  them  to 
appropriately  chosen  points  in  P  ,  being  sure  to  attach  an  assertion 
to  at  least  one  point  in  each  loop  in  P  . 

(b)  Pass  P  ,  together  with  this  set  of  assertions,  to  a  verification 
condition  generator,  which  will  respond  by  passing  back  a  set  of 
logical  formulae  F  . 

(c)  Verify  all  these  logical  formulae. 
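
As an illustration of steps (a) and (b), the sketch below writes out by hand the three verification conditions that a generator would produce for a one-loop program. The program, its assertions, and all function names are our own hypothetical example. The final check merely tests the conditions over a bounded range of integers; this is not a proof, and in step (c) a proof checker would have to establish each condition for all values.

```python
from itertools import product

# Hypothetical annotated program (not taken from the paper):
#   input assumption:   n >= 0
#   s := 0; i := 0;
#   while i < n do          loop assertion: 0 <= i <= n  and  2*s = i*(i-1)
#       s := s + i; i := i + 1;
#   output assertion:   2*s = n*(n-1)

def pre(n):
    return n >= 0

def inv(n, i, s):
    return 0 <= i <= n and 2 * s == i * (i - 1)

def post(n, s):
    return 2 * s == n * (n - 1)

def vc_entry(n):
    # The input assumption implies the loop assertion after initialisation.
    return (not pre(n)) or inv(n, 0, 0)

def vc_preserve(n, i, s):
    # Loop assertion and loop test imply the loop assertion after the body.
    return (not (inv(n, i, s) and i < n)) or inv(n, i + 1, s + i)

def vc_exit(n, i, s):
    # Loop assertion and failed loop test imply the output assertion.
    return (not (inv(n, i, s) and not (i < n))) or post(n, s)

def check_bounded(limit):
    """Test all three conditions for small integers only (evidence, not proof)."""
    ns = range(-limit, limit + 1)
    ss = range(-limit * limit, limit * limit + 1)
    if not all(vc_entry(n) for n in ns):
        return False
    return all(vc_preserve(n, i, s) and vc_exit(n, i, s)
               for n, i, s in product(ns, ns, ss))
```

Calling `check_bounded(6)` exercises every condition on a few thousand states. The gap between such testing and a checked proof of the same three formulae is exactly the cost discussed below.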

To  accomplish  step  (a),  the  author  of  P  needs  to  take  a  close  look 
at P, and needs to make explicit all the implicit assertions which shaped
P  during  its  composition.   This  may  be  tedious,  but  it  is  an  act  of 
completion  rather  than  invention,  so  that  the  effort  required  to  accomplish 
it  will  generally  not  be  inordinate.   Step  (b)  involves  only  symbolic 
manipulation  of  a  conventional  sort,  and  is  easily  programmed.   The  crucial 
difficulty lies in step (c). To accomplish this step, proofs must be
supplied and formally verified. These proofs need not be profound (and
even if they are, will have been worked out informally in all essential
regards long before the programmer encountered them), but they can be long
and tedious anyhow. Proof-checker technology is still only weakly developed;
proof checkers can take only very small steps themselves, and must therefore
often be guided in great detail. Thus we identify the essential difficulty
of program verification as an economic one: the cost of fully formal
program verification using existing techniques is still too high. The
central  aim  of  work  in  program  verification  technology  must  therefore  be 
to  reduce  these  costs. 

4.   How can the cost of program verification be reduced?

In  attempting  to  reduce  the  cost  of  full  formal  program  verification, 
which is at present prohibitive (i.e., more than ten times the cost of
program  development  and  informal  debugging  by  conventional  methods),  a 
thicket  of  interrelated  difficulties  must  be  faced. 

(i)  Level  of  language.   Depending  on  the  weight  assigned  to  efficiency 
of  product  as  against  programming  cost,  a  program  can  be  written  in  any  one 
of  a  number  of  styles.   If  very  high  efficiency  and  minimal  code  size  are 
strictly  required,  then  there  may  be  no  choice  but  to  produce  a  carefully 
tailored  assembly  language  version  of  a  code.   If  high  efficiency  is  not 
a  crucial  issue  but  overwhelming  programming  costs  are  feared,  then  it  may 
be  much  better  to  make  use  of  a  programming  language  of  the  highest 
available  level.   If  verification  is  at  issue,  then  the  relative  advantage 
of  very  high  level  languages  becomes  greater  still,  since  if  a  program 
verification  system  must  cope  with  masses  of  low  level  programming  detail 
the  costs  of  verification  are  bound  to  become  overwhelming.   Thus,  for 
verification  technology  to  become  practical,  use  of  languages  of  substantially 
higher  level  than  is  now  typical  will  have  to  become  a  much  more  common  and 
broadly  understood  technique. 

(ii)  Transformational  approach  to  program  development.   Complex  algorithms 
can  most  easily  be  understood  and  proved  correct  when  written  in  a  high  level 
abstract  form.   Initially,  such  an  abstract  algorithm  version  may  even 
contain  non-executable  expressions.   An  initial  algorithm  text  of  this  kind 
can then be moved toward implementability, and ultimately toward high
efficiency, by various reorganisations, including rearrangement of control
patterns, choice of data structures, replacement of abstract inefficient
instruction  sequences  by  concrete  much  more  efficient  sequences,  etc. 
All  of  these  transformations  must  be  such  as  to  preserve  correctness. 
That  is,  a  practical  program  verification  technology  will  have  to  formalise, 
but  in  a  manner  guaranteed  to  preserve  correctness,  the  whole  sequence  of 
generating  steps  by  which  programmers  construct  programs  in  the  first  place. 

(iii)  Re-use  of  verified  program  fragments,  modifiability  of  large 
correct  programs. 

In  view  of  the  very  high  present  costs  of  verification,  re-usability 
of  verified  fragments  is  a  central  issue.   For  this  to  be  possible,  one 
must  seek  dictions  which  allow  programs  to  be  'factored'  into  combinable 
subparts  which  represent  standardised,  formally  recognisable  fragments  of 
programming  technique.   If  a  system  permitting  this  were  developed,  it 
would  be  possible  for  application  programmers  to  combine  these  'root  programs' 
into  the  larger  programs  that  they  required.   Other  more  specialised 
'algorithm  developers'  could  be  responsible  for  entering  these  root  programs, 
with  their  proofs,  into  a  growing  library  of  verified  program  fragments  and 
transformation  rules.   Thus  programming  would  evolve  into  an  activity 
resembling formal algebraic manipulation within a symbolic manipulation
system such as MACSYMA.

The requirements of verification technology are also likely to constrain
the  way  in  which  programs  are  modified  during  their  useful  lifetime.   It  must 
be  noted  that  programs  are  not  ordinarily  static  objects  which  remain 
entirely  unchanged  after  their  initial  development.   Rather,  during  their 
useful  life  they  undergo  continual  large  and  small  changes,  as  new  features 
are  added  and  the  implementation  of  old  features  is  polished.  At  present 
this  important  process  of  continuing  modification  is  informal,  and  indeed 
casual.   A  solid  program  verification  technology  can  hardly  tolerate  this, 
since  a  single  uncontrolled  program  change  can  invalidate  the  whole  of  an 
expensive  and  painfully  constructed  correctness  proof.  We  therefore  expect 
program  modification  to  be  managed  in  roughly  the  following  way  when  formal 
program  verification  comes  into  general  use:  Verified  program  texts  will 
be  held,  along  with  assertions  that  have  already  been  proved  correct,  in 
special, secure program libraries. To modify a program in such a library,
one will be required to use a specialised 'verifying editor'. This
editor  will  reject  any  modification  for  which  appropriate  assertions  have 
not  been  supplied,  will  determine  the  set  of  assertions  affected  by  any 
proposed  modification,  and,  before  accepting  the  modification,  will  demand 
proof  of  all  implications  connecting  old  assertions  either  with  a  new 
assertion  or  with  any  path  through  modified  text.   This  restriction  ensures 
that  the  programs  of  a  secure  library  will  remain  correct  even  when  modified. 
Moreover,  the  material  needed  to  justify  a  local  program  modification  will 
generally  be  only  a  very  small  fraction  of  the  material  originally  needed 
to  justify  all  the  details  of  large  programs. 

(iv)  Development of proof-checker technology.

In  the  preceding  pages,  we  have  argued  that  the  difficulties  of  program 
verification  will  be  so  severe  as  to  mandate  substantial  revisions  of 
current  programming  practice,  forcing  a  move  to  languages  of  very  high 
level  and  requiring  the  construction  of  substantial  program  management 
systems  which  constrain  the  manner  in  which  large  application  programs 
can  be  changed.   Since  the  anticipated  technical  difficulties  of  verification 
are  so  much  greater  than  those  of  compilation,  one  of  the  first  things  that 
it makes sense to do is to adapt the languages and compilers with which
one  will  work  so  as  to  alleviate  the  verification  problems  that  must 
ultimately  be  faced.   Of  course,  even  after  this  adaptation  is  made,  the 
essential  residue  that  will  remain  is  a  mass  of  theorems  to  be  verified 
formally  by  mathematical  techniques.   Hence  improved  proof  verification 
technology  is  a  necessary  part  of  any  mature  technology  of  program  proof. 

A  proof  checker  is  an  interactive  programmed  system  into  which  one 
can  enter  sequences  of  logical/mathematical  formulae,  each  of  which  is  a 
consequence,  according  to  the  laws  of  logic,  of  preceding  formulae.   Such 
a  system  will  continue  to  accept  new  formulae  as  long  as  it  can  verify  that 
each  formula  is  indeed  a  correct  consequence  of  what  has  gone  before.   Any 
step  which  is  too  complex  for  the  verifier  to  follow  (or  which  is  incorrect) 
will  be  rejected,  forcing  the  verifier's  user  to  enter  a  number  of  intermediate 
formulae  in  order  to  get  the  verifier  to  accept  the  formula  which  really 
interests him. Thus the verifier ensures rigorously against logical error,
possibly at the price of requiring its user to key in a burdensome mass of
intermediate detail. The designer of such a proof verifier must aim to
provide it with enough internal power for the mass of detail which
is demanded to be reduced to reasonable levels. As noted above,
we expect most of the cost of program verification to lie in this mass of
detail. Thus we expect improvements in the power of proof verifiers to
translate directly into reductions in the cost of program proof.

Full-scale  proof  verifiers  can  be  expected  to  consist  of  three 
principal components:

(a)  An  inner  core  of  procedures  which  handle  what  the  system  regards 
and  the  user  perceives  as  elementary  inferential  steps. 

(b)  An  outer  layer  of  administrative  routines  which  mediate  between 
the  system  user  and  the  inferential  core  of  the  verification  system.   These 
routines  maintain  a  growing  library  of  already  proved  theorems,  keep  track 
of  proofs  in  progress,  and  define  the  temporary  set  of  hypotheses  under 
which  proof  is  currently  proceeding.   These  routines  also  display  logical 
formulae in a form appropriate for user inspection, and interpret and
supplement user keyboard input, converting this input into a sequence of
directives  to  the  primitive  inferential  procedures  of  the  verification 
system's  inner  core. 

(c)  A  family  of  extension  mechanisms,  which  allow  a  system  user  to 
build  up  a  personalised  family  of  auxiliary  routines  that  can  transform 
and  supplement  his  keyboard  input  in  specialised  ways,  and  which  also 
allow  new  inference  procedures  to  be  added  to  the  system  core.   Of  course, 
before  any  addition  of  this  second  kind  will  be  accepted,  the  system  will 
insist  that  the  addition  be  justified  rigorously  by  formal  proof  of  an 
appropriate  metatheorem. 

The  most  challenging  and  critical  part  of  a  proof  verifier  having 
this  general  structure  is  its  inferential  core.   If  this  core  is  powerful 
enough,  the  system  user  will  be  able  to  make  comfortably  large  and 
intuitive  formal  steps;  if  not,  then  large  and  counterintuitive  masses 
of  detail  may  be  required  to  prove  even  rather  simple  statements.   The 
size  of  the  elementary  inferential  steps  which  a  proof  verifier  permits 
directly  controls  the  cost  of  program  proofs  in  a  system  based  upon  the 
verifier. 


The  inferential  core  of  a  proof  verification  system  is  precisely  an 
automatic  theorem  prover.   For  this  reason,  we  shall  now  review  the  present 
status  of  techniques  for  automatic  proof  of  mathematical  theorems,  an  area 
which  has  been  worked  on  intensively  for  over  a  decade,  and  which  has 
accumulated  a  literature  of  over  three  hundred  papers. 

It  is  useful  to  divide  the  automatic  proof  methods  that  have  been 
developed  into  two  main  classes,  which  we  shall  call  non-instantiating 
methods and instantiating methods respectively. Non-instantiating automatic
proof  methods  are  characterised  by  the  fact  that  they  work  exclusively  with 
mathematical  objects  which  are  either  implicit  in  the  statement  S  of  the 
theorem  given  to  be  proved  or  which  can  be  calculated  from  such  objects 
by  some  deterministic  algorithm  known  in  advance.   Instantiating  proof 
methods  are  considerably  more  general,  in  that  they  are  willing  to  search 
for  objects  not  directly  implicit  in  the  statement  S  to  be  proved  but 
necessary  for  the  proof  of  S  .  To  make  this  distinction  clearer,  consider 
a  hypothetical  automatic  geometry  theorem  prover  P  .   If  P  is 
non-instantiating, it can work by propagating sequences of congruences,
similarities, etc., between lines, angles, triangles, etc., present in
the figure F presented to it; but it will never attempt to construct
any  line  or  consider  any  point  not  already  present  in  F  .   An 
instantiating  geometry  theorem  prover  would  be  willing  to  try  such 
constructions,  and  would  therefore  face  an  enormously  more  intimidating 
range  of  possibilities.   Generally  speaking,  non-instantiating  proof 
algorithms  handle  limited  classes  of  statements,  but  can  do  so  efficiently; 
instantiating  provers  are  logically  general,  but  for  this  very  reason 
can  never  be  efficient.  An  ideal  automatic  proof  package  will  combine 
both  techniques,  possibly  using  a  general  instantiating  method  to  deduce 
sets  of  statements  of  special  form  whose  consistency  can  then  be  tested 
rapidly using more specialised and efficient non-instantiating techniques.

As an example illustrating the instantiating/non-instantiating
distinction, consider the elementary set-theoretic boolean implication

(1)  $(A \subseteq B \;\&\; B \subseteq C \cup D \;\&\; A \cap C = \emptyset) \Rightarrow A \subseteq D$.

Statements like this can be verified by easy non-instantiating techniques.
Indeed, the following steps will handle this example, and all others like
it:




(a)  We  proceed  by  contradiction.   For  the  formula  (1)  to  be  false, 
its  conclusion  must  be  false,  and  each  of  its  hypotheses  must  be  true. 
Thus  if  (1)  is  false  we  must  have 

(2)  $A - B = \emptyset,\quad B - (C \cup D) = \emptyset,\quad A \cap C = \emptyset,\quad A - D \neq \emptyset$.

(b)  Any collection of boolean equalities like (2), consisting of
several equations but just one inequality, can be tested for consistency
by interpreting the variables which occur in it as propositional variables
and regarding the null-set symbol $\emptyset$ as a synonym for the propositional
constant 'false' (and '$\neq \emptyset$' as a synonym for 'true'). In this sense, (2)
can be rewritten as

(3)  $A \Rightarrow B,\quad B \Rightarrow (C \vee D),\quad \neg(A \;\&\; C),\quad A \;\&\; \neg D$.

Application  of  any  propositional  consistency  algorithm  will  show  that 
the  set  of  propositions  (3)  is  inconsistent,  and  thus  (1)  must  be  true. 
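
Step (b) can be sketched with the simplest possible propositional consistency algorithm, an exhaustive truth table. The function names below are our own; any more efficient consistency test could be substituted.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def clauses_3(A, B, C, D):
    """The four propositions of (3), evaluated under one truth assignment."""
    return [
        implies(A, B),          # from A - B = empty set
        implies(B, C or D),     # from B - (C u D) = empty set
        not (A and C),          # from A n C = empty set
        A and not D,            # from A - D != empty set
    ]

def consistent(clause_fn, n_vars):
    """True if some truth assignment satisfies every clause at once."""
    return any(all(clause_fn(*values))
               for values in product([False, True], repeat=n_vars))
```

Here `consistent(clauses_3, 4)` evaluates to `False`: the last proposition forces A true and D false, the first two then force B and C true, and that violates the third. If the negated conclusion is dropped, the remaining three propositions are satisfiable, as expected for mere hypotheses.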

This  simple  example  illustrates  the  flavor  of  non-instantiating 
automatic  proof  methods.  Although  only  restricted  classes  of  statements 
can  be  handled  by  such  methods,  some  of  these  classes  are  general  enough 
to  be  of  practical  significance.  Among  these  statement  classes  are  the 
following : 

(A)  Statements  only  involving  propositional  connectives,  integers  and 
linear  inequalities  among  them,  sets  and  boolean  operations  on  them, 
and  the  cardinality  operation  #A  .  An  example  of  such  a  statement 
would  be 

$(A \subseteq B \;\&\; B \subseteq C \cup D \;\&\; \#(A \cap C) < k) \Rightarrow \#A < (\#D + k)$.

(B)  (Behmann)   Statements  as  in  (A),  but  involving  quantifiers  over 
sets  as  well. 

(C)  (Presburger)   Statements  only  involving  integers,  addition,  subtraction, 
inequality  and  equality  comparisons,  and  quantification  over  integer 
variables. 

(D)  (Tarski)   Statements  only  involving  real  numbers,  addition,  subtraction, 
multiplication,  inequality  and  equality  comparisons,  and  quantification 
over  real  variables. 

(E)  Statements  involving  only  propositional  connectives,  the  equality 
relationship,  and  uninterpreted  function  symbols. 


Verification of entirely general statements of the two forms (C) and
(D) will necessarily be hyperexponentially inefficient; on the other hand,
various special subforms of frequent occurrence (e.g. linear inequalities,
polynomial identities and implications between small collections of
polynomial equations) can be handled in acceptably efficient fashion.

Non-instantiating proofs in the theory of equality with uninterpreted
function symbols ((E) in the preceding list) can be handled very efficiently
by use of the Hopcroft-Ullman-Tarjan high speed equivalence class
processing  algorithms.   Of  course,  high  efficiency  algorithms  for  other 
theorem-proving  applications  can  be  expected  to  develop  also. 
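
To indicate how equivalence class processing decides statements of class (E), the following sketch tests a set of ground equalities and inequalities for consistency. It is a deliberately naive version under our own assumptions: terms are encoded as nested tuples, all names are hypothetical, and the congruence step is a quadratic scan rather than the high speed algorithm referred to above.

```python
class UnionFind:
    """Equivalence classes with path halving."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def collect_subterms(term, acc):
    # A term is a constant (str) or a tuple (function_symbol, arg1, ...).
    acc.add(term)
    if isinstance(term, tuple):
        for arg in term[1:]:
            collect_subterms(arg, acc)

def equalities_consistent(equalities, disequalities):
    """True if the equalities do not force any listed pair to be equal."""
    terms = set()
    for a, b in list(equalities) + list(disequalities):
        collect_subterms(a, terms)
        collect_subterms(b, terms)
    uf = UnionFind()
    for a, b in equalities:
        uf.union(a, b)
    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:                     # propagate: equal arguments give equal values
        changed = False
        for i, s in enumerate(apps):
            for t in apps[i + 1:]:
                if (s[0] == t[0] and len(s) == len(t)
                        and uf.find(s) != uf.find(t)
                        and all(uf.find(x) == uf.find(y)
                                for x, y in zip(s[1:], t[1:]))):
                    uf.union(s, t)
                    changed = True
    return all(uf.find(a) != uf.find(b) for a, b in disequalities)

def f(t):
    return ("f", t)

a = "a"
f3 = f(f(f(a)))
f5 = f(f(f3))
```

With these definitions, `equalities_consistent([(f3, a), (f5, a)], [(f(a), a)])` is `False`: from f(f(f(a))) = a and f(f(f(f(f(a))))) = a it follows that f(a) = a, without any term outside the given statements ever being constructed. That restriction to terms already present is what makes the method non-instantiating.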

A  useful  technique  which  makes  it  easy  to  combine  independent 
non-instantiating  theorem  provers  developed  for  separate  mathematical 
areas,  thereby  obtaining  a  composite  prover  able  to  handle  all  these  areas 
at  once,  has  recently  been  described  by  Nelson  and  Oppen.   The  separate 
subprovers  of  such  a  composite  prover  run  independently  and  in  parallel, 
each  handling  that  part  of  a  given  collection  of  statements  which  only 
involves  function  and  predicate  symbols  which  belong  to  the  subtheory  which 
the  particular  subprover  can  handle.  Whenever  a  new  relationship  of 
equality  is  deduced  by  one  of  the  subprovers,  it  is  communicated  to  all 
the  other  subprovers,  which  can  then  use  it  to  deduce  additional  facts 
and  eventually  additional  equalities.   Only  equalities  need  to  be  communicated 
between  subprovers.   This  narrow  communication  path  between  independent 
mechanisms  which  can  otherwise  act  in  complete  independence  is  ideally 
adapted  for  implementation  on  the  parallel  hardware  which  may  become 
available  in  the  next  few  years. 

It  appears  likely  that  the  range  of  statements  accessible  to 
non-instantiating  proof  methods  can  be  broadened  in  many  modest  but 
useful  ways,  and  efforts  along  this  line  ought  to  be  pushed. 

We  turn  now  to  a  discussion  of  the  more  general  class  of  instantiating 
provers. These can be separated into two subclasses: provers of the
resolution  type,  which  handle  arbitrary  systems  of  propositions  written 
in  first  order  predicate  calculus,  and  more  specialised  systems,  which 
deal  somewhat  more  efficiently  with  important  special  classes  of 
propositions,  for  example  systems  of  algebraic  identities. 




In order to estimate the efficiency and likely limitations of general
predicate  calculus  provers  of  the  resolution  type,  we  can  begin  with  a  few 
preliminary  remarks  concerning  the  analysis  of  systems  of  statements  written 
in  the  much  simpler  propositional  calculus.   Most  of  the  statements  (in 
either  the  propositional  or  the  predicate  calculus)  that  a  prover  is  likely 
to  encounter  will  have  the  form 

(4)       $(P_1 \;\&\; P_2 \;\&\; \ldots \;\&\; P_n) \Rightarrow Q$,

i.e., will involve several hypotheses $P_i$ which together imply a conclusion
Q. Statements of this form are called Horn clauses. Occasionally, but
not often, statements of the more general form

(5)       $(P_1 \;\&\; P_2 \;\&\; \ldots \;\&\; P_n) \Rightarrow (Q_1 \vee Q_2 \vee \ldots \vee Q_m)$

will be encountered. Such clauses involve a disjunction of possible
conclusions; when clauses like this arise in ordinary mathematical practice
proof by consideration of separate subcases is likely to be necessary.
A clause (5) is said to have m-1 excess positive literals (namely
$Q_2, \ldots, Q_m$). Finally, any set of clauses to be tested for consistency will
contain a few positive unit clauses of the simple form P and a few
negative unit clauses of the form $\neg P$ ('not P').

If a set of propositions contains Horn clauses exclusively, the
propositional consistency of the set can be tested very rapidly, namely
in a time linearly proportional to the total length of the set of clauses.
(One can proceed simply by deducing

(4')      $(P_2 \;\&\; \ldots \;\&\; P_n) \Rightarrow Q$

from (4) for each positive unit clause $P_1$ which has previously been
deduced. Eventually this process will either run out of new positive unit
clauses (in which case the collection of clauses under consideration is
consistent) or will reach a direct contradiction between a positive and a
negative unit clause.) This fast procedure can be generalized to sets S
of clauses in which non-Horn clauses (5) appear, but then is no longer
quite as fast. More specifically, for such S, the resulting consistency
test procedure is exponential in the total number E of excess positive
literals which occur in S. Since non-Horn clauses (5) are rare, E will
generally be quite small, so that in all but the unluckiest cases we can
expect  to  test  a  set  of  clauses  for  propositional  consistency  rather 
rapidly. 
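
The Horn-clause procedure just described can be sketched as follows. The representation (rules as pairs of hypotheses and conclusion, with separate collections of positive and negative unit clauses) and the name `horn_consistent` are our own assumptions. Each hypothesis occurrence is visited at most once, which is what gives the linear bound.

```python
def horn_consistent(rules, facts, denials):
    """Test a set of propositional Horn clauses for consistency.

    rules   : list of (hypotheses, conclusion), meaning P1 & ... & Pn => Q
    facts   : positive unit clauses P
    denials : negative unit clauses, given as the symbols P of each 'not P'
    """
    denials = set(denials)
    remaining = []                 # hypotheses of each rule not yet deduced
    watchers = {}                  # symbol -> indices of rules that use it
    queue = list(facts)
    for idx, (hyps, concl) in enumerate(rules):
        hyps = set(hyps)
        remaining.append(len(hyps))
        for p in hyps:
            watchers.setdefault(p, []).append(idx)
        if not hyps:               # a rule with no hypotheses is a fact
            queue.append(concl)

    derived = set()
    while queue:
        p = queue.pop()
        if p in derived:
            continue
        derived.add(p)
        if p in denials:           # positive and negative unit clause clash
            return False
        for idx in watchers.get(p, []):
            remaining[idx] -= 1    # strike P from the rule, as in (4')
            if remaining[idx] == 0:
                queue.append(rules[idx][1])
    return True                    # ran out of new positive unit clauses
```

For example, `horn_consistent([(["A"], "B"), (["B"], "C")], ["A"], ["C"])` returns `False`, while the same rules and fact with no negative unit clause are consistent.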

All this tells us that the difficulty of testing a set of predicate
formulae  for  logical  (predicate)  consistency,  which  is  a  matter  very  much 
more  difficult  than  the  corresponding  propositional  test,  lies  almost 
exclusively  in  the  difficulty  of  determining  what  values  should  be  assigned 
to  the  variables  which  occur  within  these  predicate  formulae.   That  is,  the 
central  difficulty  in  predicate  calculus  proof  lies  in  the  difficulty  of 
instantiating  variables  correctly.   We  shall  now  review  various  key  technical 
facts  relevant  to  this  problem. 

(a)  Any system of predicate calculus formulae can be transformed easily
and efficiently into a collection of clauses each of which is of the form

(6)      $P_1 \vee P_2 \vee \ldots \vee P_n$.

Here, each $P_j$ will be a positive or negative unit clause of the form

         $R(e_1, \ldots, e_m)$   or   $\neg R(e_1, \ldots, e_m)$,

where R is a predicate symbol representing some arbitrary m-argument
boolean-valued function, and each $e_j$ is a formal expression built from
variables x, y, from symbols c, d, etc., representing constant objects,
and from symbols f, g, etc. representing arbitrary multi-argument
mappings between objects. (A typical formula of this kind might look
something like

(8)      $\neg R(x,y) \vee R(f(x,y), g(y))$.)

(b)  Herbrand's  Fundamental  Theorem  tells  us  that  a  collection  S  of 
predicate  formulae  (6)  is  logically  inconsistent  (according  to  the  rules  of 
predicate  calculus;  these  are  in  fact  the  only  rules  of  logic)  if  and  only 
if one can make substitutions for the variables occurring in the formulae
of  S  in  such  a  way  as  to  generate  a  purely  propositional  inconsistency. 
(For  example,  the  set  of  predicate  formulae 
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(9a)      R(f(x))  =»  R(f(g(x))) 

(9b)      R(f(g(c))) 

(9c)     -.R(f(g(g(g(c))))) 

where  x  is  a  variable  and  c  is  a  constant,  are  logically  contradictory. 
To  establish  this,  we  have  only  to  generate  two  instances  of  (9a),  first 
substituting  g(c)  for  x  and  then  substituting  g(g(c))  for  x  .  This 
gives 

(10a)  R(f(g(c))) ⇒ R(f(g(g(c))))  (from (9a), with x = g(c))

(10b)  R(f(g(g(c)))) ⇒ R(f(g(g(g(c)))))  (again from (9a), with x = g(g(c)))

(10c)  R(f(g(c)))  (from (9b))

(10d)  ¬R(f(g(g(g(c)))))  (from (9c)).

If we now abbreviate R(f(g(c))) by A, R(f(g(g(c)))) by B, and
R(f(g(g(g(c))))) by C, the propositions (10) become

A ⇒ B,   B ⇒ C,   A,   ¬C,

and we clearly have a propositional contradiction. It is astonishing, but
nevertheless true according to Herbrand's Theorem, that the whole of logic
should be reducible to so trivial a process of substitution and Boolean
argument!)
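The example (9)-(10) can be replayed mechanically. The following Python sketch is an illustration only, not a procedure taken from the literature: it generates the ground instances of (9a) obtained by substituting c, g(c), g(g(c)), ... for x, adds (9b) and (9c), and tests the result for propositional inconsistency by brute-force truth tables. The function names and the clause representation (lists of (atom, polarity) pairs) are assumptions made for this illustration.

```python
from itertools import product

def g_power(k):
    """Return the ground term g(g(...g(c)...)) with k applications of g."""
    term = "c"
    for _ in range(k):
        term = f"g({term})"
    return term

def atom(k):
    """Return the ground atom R(f(g^k(c))) as a string."""
    return f"R(f({g_power(k)}))"

def ground_clauses(depth):
    """Instances of (9a) for x = g^k(c), 0 <= k < depth, plus (9b) and (9c).

    A clause is a list of (atom, polarity) literals; the implication
    P => Q is written as the clause (not P) or Q.
    """
    clauses = [[(atom(k), False), (atom(k + 1), True)] for k in range(depth)]
    clauses.append([(atom(1), True)])    # (9b): R(f(g(c)))
    clauses.append([(atom(3), False)])   # (9c): not R(f(g(g(g(c)))))
    return clauses

def propositionally_inconsistent(clauses):
    """True if no truth assignment to the atoms satisfies every clause."""
    atoms = sorted({a for clause in clauses for a, _ in clause})
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if all(any(assignment[a] == pol for a, pol in clause)
               for clause in clauses):
            return False
    return True

# Instances for x = c and x = g(c) are not enough; adding x = g(g(c))
# supplies both (10a) and (10b) and exposes the contradiction.
print(propositionally_inconsistent(ground_clauses(2)))
print(propositionally_inconsistent(ground_clauses(3)))
```

Blind enumeration of this kind succeeds here only because the needed substitutions are very shallow; nothing in the sketch addresses the problem of choosing substitutions intelligently.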

(c)  The  general  problem  of  logical  deduction  therefore  reduces  to  the 
problem  of  finding  the  right  substitutions  to  make  for  the  variables  in  a 
collection  of  predicate  formulae,  so  as  to  bring  out  a  propositional 
contradiction  if  any  such  contradiction  exists.  By  virtue  of  its  complete 
generality,  this  problem  is  of  course  inherently  difficult  (indeed, 
recursively  unsolvable  in  the  general  case).  Nevertheless,  some  clue  as 
to how to approach this general problem of instantiation can be gleaned
by  considering  the  way  in  which  a  propositional  contradiction  could  emerge 
after  all  variables  in  a  set  of  predicate  clauses  were  appropriately 
instantiated. Specifically, we can observe that in a minimal propositional
contradiction, i.e., a set C of clauses of the form (6) which are
propositionally inconsistent but which contain no propositionally inconsistent
proper subset, there can exist no propositional symbol P_j whose negative never
appears in any clause of C. (For by giving such a symbol P_j the value
'true', we make all the clauses in which P_j appears true, and thus if
the  set  C  were  propositionally  inconsistent,  it  would  have  to  remain 
inconsistent  after  all  those  clauses  were  deleted,  and  could  therefore 
not  be  a  minimal  contradiction. )  This  observation  tells  us  that  to  develop 
a (minimal) propositional contradiction by substituting for the variables
in a collection of predicate clauses, we must always ensure that every
literal P_j which is generated is the precise negative of at least one
other literal generated from some other clause of C.

It  is  easy  to  see  the  extent  to  which  this  observation  can  be  used 
to  guide  the  process  of  determining  appropriate  substitutions  of  predicate 
values. Namely, a substituted instance of a unit clause of the form
R(e_1,...,e_m) can match a substituted instance of some other unit clause
R'(e'_1,...,e'_m) if and only if the lead symbols R and R' are the same
and the internal structure of each expression e_j matches the structure
of the corresponding e'_j closely enough for e_j to be 'unifiable'
with e'_j by substitutions whose existence and value
can be determined rapidly using a recursive matching algorithm. (For
example, R(x,g(u,x)) and ¬R(f(b),g(y,z)) are unified by substituting
f(b) for x and z, y for u; and any other substitution which
unifies these two unit formulae is a special case of this substitution.)
Therefore  in  searching  for  a  pattern  of  substitutions  which  will  manifest 
a predicate contradiction by generating a propositional contradiction from
it,  we  can  confine  our  attention  to  substitutions  generated  by  unifying 
some  unit  predicate  formulae  with  some  other  unit  formulae  involving  the 
same  leading  symbol. 
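The recursive matching algorithm referred to above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Robinson's original formulation: variables are represented as plain strings, compound terms and constants as tuples whose first component is the function symbol (so the constant b is written ("b",)), and the names walk, occurs, unify and resolve are chosen for this sketch only.

```python
def walk(term, subst):
    """Follow variable bindings until a non-variable or unbound variable is reached."""
    while isinstance(term, str) and term in subst:
        term = subst[term]
    return term

def occurs(var, term, subst):
    """True if variable var occurs inside term (under the bindings in subst)."""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term[1:])
    return False

def unify(s, t, subst=None):
    """Return a most general unifier of s and t as a dict, or None if none exists."""
    subst = {} if subst is None else subst
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str):                      # s is an unbound variable
        return None if occurs(s, t, subst) else {**subst, s: t}
    if isinstance(t, str):                      # t is an unbound variable
        return None if occurs(t, s, subst) else {**subst, t: s}
    if s[0] != t[0] or len(s) != len(t):        # lead symbols or arities differ
        return None
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

def resolve(term, subst):
    """Apply the substitution to term until no bound variable remains."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return (term[0],) + tuple(resolve(arg, subst) for arg in term[1:])
    return term

# The example from the text: R(x, g(u, x)) against R(f(b), g(y, z)).
left = ("R", "x", ("g", "u", "x"))
right = ("R", ("f", ("b",)), ("g", "y", "z"))
mgu = unify(left, right)
print(mgu)
print(resolve(left, mgu) == resolve(right, mgu))
```

On the example of the text the sketch binds x and z to f(b) and u to y, in agreement with the substitution quoted there. The occurs check is what prevents a variable from being unified with a term containing itself.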

(d)  This  fundamental  observation  of  J.  A.  Robinson,  which  cuts  down 
enormously  on  the  number  of  substitutions  that  need  to  be  considered  in 
searching for a predicate contradiction, has been basic to all work on
general  instantiating  theorem  provers  since  1967.  However,  even  the  search 
that  remains  is  far  too  large  for  there  to  be  much  hope  of  deducing  any  but 
superficial  theorems  without  bringing  some  other  equally  deep  formal  principle  to 
bear.   During  the  last  decade,  considerable  energy  has  gone  into  attempts 
to refine the general process of search for predicate contradictions, and
there have appeared over one hundred papers proposing various ways of
arranging  and  pruning  this  search,  as  well  as  heuristics  for  guiding  it. 
Unfortunately  no  very  powerful  pruning  techniques  or  heuristics  have 
emerged  from  all  this  work,  aside  from  the  rather  trivial  idea  of 
concentrating  attention  on  the  shortest  formulae  which  at  any  given  moment 
offer  hope  of  leading  to  a  proof. 

The  following  technique  is  about  as  good  as  any  of  the  ideas  that  have 
been  suggested,  and  can  be  used  to  estimate  the  power  which  completely 
general  instantiating  provers  are  likely  to  attain.   Given  a  collection  S 
of  predicate  clauses  (6)  from  which  a  logical  contradiction  is  to  be  deduced, 
we  begin  by  choosing  some  clause  C  of  S  which  must  be  used  to  form  the 
contradiction.   (E.g.,  if  every  clause  of  S  other  than  C  contains  some 
unnegated  unit  formula,  then  C  must  appear  at  least  once  in  any 
contradiction  deduced  from  S  ,  and  we  can  choose  C  to  start  with. ) 
After choosing a starting clause C we arrange its units P_j in order,
putting units containing more constants and fewer variables first. Then
we fix attention on the first unit P_1 of C. For a substituted instance
of C to appear in a minimal contradiction, P_1 must precisely match the
substituted negative of a unit formula P_1' appearing in some other
clause C' of S. We consider all such P_1', and for each of them we
work out a substitution which makes P_1 and the negative of P_1' identical. This
substitution is then applied to C and C', after which we go on to
consider the still unmatched atomic subparts of (the appropriately substituted
variants of) C and C'. Matches for each of these parts are worked out
in turn; this generates a growing set C, C', C'', ... of substituted
variants of clauses taken from S, together with corresponding substitutions.
Whenever we encounter a set C, C', C'', ... of substituted clauses in which
every unit subformula P_j is matched to at least one other P_k' as its
precise negative, we test the collection C, C', C'', ... for propositional
consistency. According to Herbrand's fundamental theorem, this search
process will eventually uncover a propositional inconsistency, provided
that the given set S of predicate clauses is in fact logically
inconsistent.


(e)   Each  major  cycle  of  the  process  of  search  that  we  have  just 
described  takes  a  unit  formula  P  in  one  clause  and  matches  it  to  some 
unit  formula  P'   in  another  clause  by  working  out  a  substitution  that 
makes P and P' identical. P can normally be matched to any one of
several possible P', and it is generally not clear which one of several
possible matches will lead to the desired propositional contradiction.
Each substitution tends to introduce more constants into the set of clauses
being processed, and thus partially constrains the pattern of substitutions
that can follow. Examination of examples suggests that the number of P'
to which a given P can be matched generally hovers around 3 to 4. To
find a propositional contradiction involving n pairs of matched unit
formulae should therefore require roughly 3^n search steps. For n = 10
this is roughly 100,000, and we can therefore estimate that perfectly
general instantiating provers of the type we have been considering should
be able to work out most predicate arguments involving not more than 20
unit formulae, but will generally not be able to handle arguments much
longer  than  this.   Such  provers  can  therefore  be  expected  to  possess  a 
useful,  but  always  severely  limited,  capability  for  general  proof. 
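The order of magnitude quoted above is easily checked. The short Python sketch below simply evaluates b^n for a branching factor b of 3 and of 4; the function name search_steps is an assumption made for this illustration, and the model (a uniform branching factor, no pruning) is deliberately crude.

```python
def search_steps(branching, pairs):
    """Crude estimate: one choice among `branching` candidates per matched pair."""
    return branching ** pairs

for branching in (3, 4):
    for pairs in (5, 10, 15):
        print(branching, pairs, search_steps(branching, pairs))
```

With 10 matched pairs a branching factor of 3 gives 59,049 steps and a branching factor of 4 gives 1,048,576, so that a figure of the order of 100,000 lies between the two. With 15 pairs even the smaller branching factor gives over 14 million steps, which is the reason for expecting a sharp limit near 20 unit formulae.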

Having  thus  summarised  the  situation  in  regard  to  instantiating  theorem 
provers  designed  to  work  with  perfectly  general  sets  of  predicate  formulae, 
we  shall  now  review  the  status  of  more  specialised  instantiating  provers 
keyed  to  recognise  important  logical  relations  and  functions  and  handle  them 
in  particularly  advantageous  ways.   Specialised  methods  of  this  sort  have 
been  devised  for  treating  the  equality  relationship,  transitive  relationships 
such  as  inequality  and  inclusion,  and  also  various  algebraic  operations  and 
associated  structural  relationships  such  as  associativity.   Of  these, 
equality  is  by  far  the  most  important  and  has  received  the  most  attention. 
For  this  reason  we  will  concentrate  the  next  part  of  our  review  on 
specialised  treatments  of  equality  to  the  neglect  of  other  significant 
special  situations. 

A  specialised  variant  of  the  resolution  technique,  which  carries  the 
strange  name  'paramodulation',  has  been  devised  for  dealing  with  clauses 
containing  a  general  mixture  of  unit  formulae  of  the  form  (6)  and  equality 
units  of  the  more  special  form 

(11)      e_1 = e_2       (or e_1 ≠ e_2) .

The paramodulation technique makes use of special rules allowing equality
units to trigger substitutions for subexpressions within other units of a
set  of  clauses  being  processed.   This  approach  is  demonstrably  better  than 
a  completely  general  axiomatic  treatment  of  equality,  which  would  put  the 
relationship  x  =  y  on  the  same  footing  as  any  other  predicate  R(x, y) 
of  two  arguments;  but  it  is  still  poor.   Indeed,  for  sets  of  clauses 
consisting  exclusively  of  equality  units  (11)  paramodulation  reduces  to 
the  following  trivial  and  obviously  unsatisfactory  procedure:   substitute 
equals for equals in all possible ways, until a logical contradiction
emerges.

A  less  blind  and  much  better  treatment  of  systems  of  equalities  (11) 
was  developed  by  Knuth  and  Bendix  in  1968,  but  is  only  now  finding  its  way 
into  theorem  provers.   Given  a  set  of  identities  (11),  Knuth  and  Bendix 
arrange  them  so  that  the  left-hand  side  of  (11)  is  'longer'  in  some 
appropriate  formal  sense  than  its  right  hand  side.   This  allows  each 
relationship  (11)  to  be  regarded  as  a  'simplification',  and  in  favorable 
cases  establishes  an  unambiguous  'downhill'  direction  in  which  such  a 
relationship  can  be  applied  by  substituting  equals  for  equals.   This  prunes 
the  process  of  substitution  very  drastically,  and  allows  acceptably 
efficient  treatment  of  cases  in  which  the  more  haphazard  paramodulation 
approach  would  bog  down  completely.   Unfortunately,  as  it  stands,  the 
Knuth-Bendix technique applies only to systems of identities like (11),
and  not  to  all  such  systems.   However,  ways  have  recently  been  found  to 
extend  it  to  systems  of  implications  between  equalities  like 

e_1 = e_2 & e_3 = e_4 & ... & e_m = e_{m+1} ⇒ e' = e'' ,

and  also  improve  the  efficiency  with  which  this  technique  applies  to  situations 
involving  associative  and  commutative  operations.   It  would  be  desirable  to 
extend the Knuth-Bendix technique to systems of clauses involving equality
units  intermixed  with  more  general  unit  formulae  (6),  in  this  way  obtaining 
a  more  efficient,  though  perhaps  somewhat  specialised,  alternative  to 
paramodulation. 
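The 'downhill' use of oriented identities can be illustrated by the following Python sketch. It shows only the simplification step, in which each identity is applied from its longer side to its shorter side until nothing more applies; it is not the Knuth-Bendix completion procedure itself, and the three rules used (two identity laws and one inverse law for an operation mul) are chosen purely as an assumed example. Pattern variables are strings, while constants and compound terms are tuples headed by their symbol.

```python
def match(pattern, term, binding):
    """One-way matching: extend binding so that pattern becomes term, else None."""
    if isinstance(pattern, str):                      # pattern variable
        if pattern in binding:
            return binding if binding[pattern] == term else None
        return {**binding, pattern: term}
    if (not isinstance(term, tuple) or pattern[0] != term[0]
            or len(pattern) != len(term)):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        binding = match(p, t, binding)
        if binding is None:
            return None
    return binding

def instantiate(pattern, binding):
    """Replace the variables of pattern by the terms bound to them."""
    if isinstance(pattern, str):
        return binding[pattern]
    return (pattern[0],) + tuple(instantiate(p, binding) for p in pattern[1:])

def size(term):
    """Number of symbols in a term; used as the 'length' ordering."""
    if isinstance(term, str):
        return 1
    return 1 + sum(size(arg) for arg in term[1:])

E = ("e",)
RULES = [                                   # (longer side, shorter side)
    (("mul", E, "x"), "x"),                 # e * x = x
    (("mul", "x", E), "x"),                 # x * e = x
    (("mul", ("inv", "x"), "x"), E),        # inv(x) * x = e
]
assert all(size(lhs) > size(rhs) for lhs, rhs in RULES)

def rewrite_once(term):
    """Apply one rule, innermost subterms first; return None if none applies."""
    if not isinstance(term, tuple):
        return None
    for i, arg in enumerate(term[1:], start=1):
        new_arg = rewrite_once(arg)
        if new_arg is not None:
            return term[:i] + (new_arg,) + term[i + 1:]
    for lhs, rhs in RULES:
        binding = match(lhs, term, {})
        if binding is not None:
            return instantiate(rhs, binding)
    return None

def normalise(term):
    """Rewrite downhill until no rule applies; every step shrinks the term."""
    while True:
        simpler = rewrite_once(term)
        if simpler is None:
            return term
        term = simpler

a, b = ("a",), ("b",)
print(normalise(("mul", ("mul", ("inv", a), a), b)))   # (inv(a) * a) * b
```

Because every rule strictly reduces the size of the term, the process must terminate, and no search over alternative directions of substitution is needed. This is the pruning effect described above; whether the normal form reached is independent of the order of rule application is exactly the question that the full Knuth-Bendix procedure addresses.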

There  remains  one  last  significant  technique  to  discuss.  This  is  the 
systematic  use  of  definitions,  in  reverse,  to  eliminate  defined  predicates 
and operators, thereby reducing formulae containing these predicates and
operators to simpler form. This technique rests upon an important general
metamathematical principle, the so-called principle of definition, which
we can state as follows:

A. Let S be any collection of predicate formulae, let e(x_1,...,x_n)
be any predicate formula whose free variables are x_1,...,x_n, and let F
be  any  completely  new  predicate  symbol,  i.e.,  a  symbol  not  appearing  in 
any  formula  of  S  .   Then  without  affecting  the  consistency  of  S  we  can 
extend  it  by  adding  the  formula 

(12)  (∀x_1,...,x_n)(F(x_1,...,x_n) ⇔ e(x_1,...,x_n))

to S. (Formula (12) is then called the definition of the new predicate
symbol F.)

B.  Any  predicate  formula  which  can  be  proved  from  formula  (12)  and  the 
other formulae of S, and which does not contain any occurrence of the
defined  symbol  F  ,  can  be  proved  from  the  formulae  of  S  alone. 

A  similar  but  somewhat  more  complex  principle  of  definition  allows 
introduction and removal of new function rather than new predicate symbols.

The  principle  of  definition  tells  us  that  a  predicate  formula  P  which 
follows  from  the  formulae  of  S  and  from  a  definition  (12)  can  always  be 
proved as follows: First use (12) to eliminate every occurrence of F
from P (by substituting e(x_1,...,x_n) for F(x_1,...,x_n) at every occurrence
of F in P); then the proof can be completed by using formulae of S
only, and the definition (12) need never be used again.

The  significance  of  this  rule  lies  in  the  fact  that  it  gives  us  a  priori 
knowledge  of  the  number  of  times  that  (12)  must  be  used  to  prove  P  ,  and  of 
the  pattern  in  which  (12)  should  be  used.   It  is  clear  that  this  can  make 
at  least  part  of  the  process  of  proving  P  much  more  deterministic,  hence 
efficient,  than  such  proof  would  otherwise  be. 
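The elimination step just described is a purely mechanical rewriting, as the following Python sketch suggests. Formulae are represented as nested tuples headed by a connective or predicate symbol, with variables as strings; this representation, the function names, and the sample definition of a predicate Subset are all assumptions made for the illustration. The sketch also assumes that the bound variables of the defining formula do not clash with the names of the arguments substituted into it.

```python
def substitute(formula, env):
    """Simultaneously replace variables (strings) according to env."""
    if isinstance(formula, str):
        return env.get(formula, formula)
    return (formula[0],) + tuple(substitute(part, env) for part in formula[1:])

def eliminate(formula, symbol, params, body):
    """Replace every occurrence symbol(t_1..t_n) by body with params := t_1..t_n."""
    if isinstance(formula, str):
        return formula
    parts = tuple(eliminate(part, symbol, params, body) for part in formula[1:])
    if formula[0] == symbol:
        return substitute(body, dict(zip(params, parts)))
    return (formula[0],) + parts

def mentions(formula, symbol):
    """True if the symbol heads any subformula of formula."""
    if isinstance(formula, str):
        return False
    return formula[0] == symbol or any(mentions(p, symbol) for p in formula[1:])

# Assumed definition, in the shape of (12):
#   Subset(A, B)  <=>  (forall w)(w in A  =>  w in B)
PARAMS = ("A", "B")
BODY = ("forall", "w", ("implies", ("in", "w", "A"), ("in", "w", "B")))

# A formula P containing the defined symbol twice.
P = ("and", ("Subset", "p", "q"), ("Subset", "q", "p"))

expanded = eliminate(P, "Subset", PARAMS, BODY)
print(expanded)
print(mentions(expanded, "Subset"))
```

The defined symbol is used exactly once per occurrence and then never again, which is the a priori knowledge referred to above: no search is needed to decide when or how often the definition should be applied.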

The  definitional  technique  that  we  have  just  described  applies  with 
particular  advantage  to  proofs  in  set  theory,  since  set  theory  involves  only 
a  very  few  primitive  axioms.   Moreover,  only  two  primitive  predicates, 
namely  equality  and  set  membership,  enter  into  set  theory,  all  other  predicates 
and operations being definable, and definable rapidly, in terms of these two
primitives. Obviously, this gives the definitional technique particular
scope.   This  fact  has  been  exploited  by  Carvalho  and  by  Bledsoe  and  his 
group  at  the  University  of  Texas,  who  have  built  set-theoretic  theorem 
provers  that  incorporate  definition-reversing  rewrite  rules  and  that  have 
been  able  to  prove  theorems  in  elementary  set  theory  and  topology  going 
considerably  beyond  what  could  be  done  by  any  pure  resolution  technique. 

The  main  shortcoming  of  these  otherwise  surprisingly  powerful  provers 
is their inability to deal adequately with the problem of set-theoretic
instantiation.   That  this  should  be  so  follows  very  directly  from  the 
structure  of  the  axioms  of  set  theory.   These  axioms  fall  into  two  classes. 
A  very  few  axioms  state  basic  properties  of  sets.   (Of  these,  the  'axiom  of 
extensionality', which simply states that two sets A and B are equal if
and only if every element of A is also an element of B and vice-versa,
is  the  most  important.)   A  somewhat  larger  number  of  axioms  then  assert  the 
existence  of  all  the  fundamental  set  theoretic  constructs,  i.e.,  intersection, 
union,  pair  set,  power  set,  etc.   From  the  human  mathematician's  point  of  view, 
these  'existential  axioms'  are  ideally  powerful,  since  collectively  they 
imply  the  existence  of  every  set  definable  by  a  set  former 

{e(x)|c(x)}   . 

On  the  other  hand,  from  the  point  of  view  of  a  set-theoretic  theorem  proving 
program  needing  to  decide  how  a  particular  variable  is  to  be  instantiated, 
this  complete  flexibility  of  the  existential  axioms  of  set  theory  creates 
severe  problems,  since  these  axioms  allow  extremely  general  instantiations  and 
thus  guide  the  choice  of  an  instantiation  not  at  all.   In  consequence  of  this 
fact, it is hard for a proof program to cope with theorems requiring even
trivial instantiations. For example, even so simple an assertion as

(∃C)(∀x)(x ⊆ A ∨ x ⊆ B ⇒ x ∈ C)

which is proved at once when we supply the apparently obvious instantiation

C = powerset(A) ∪ powerset(B)

manually, is not easy for a fully automatic prover to handle. Since we expect
the  method  of  definition  elimination  outlined  above  to  apply  rather  successfully 
to  set-theoretic  proofs,  the  problem  of  set-theoretic  instantiation  that  we 
have just described emerges as a significant issue for mechanical theorem
proving generally. Some techniques which apply to this problem can be
gleaned from classical work of Behmann and from quite recent papers of
Bledsoe and his students. Even if this problem proves to be intractable,
we can still hope to build verification systems which, although they require
manual input of all but the most trivial set instantiations, are able to
handle  most  other  steps  in  a  proof  in  fairly  automatic  fashion.   Solution 
of  important  special  cases  of  the  set-theoretic  instantiation  problem  would 
give  us  significantly  more  comfortable  proof  verifiers,  able  perhaps  to 
accept  proofs  no  more  than  2  or  3  times  as  long  as  a  very  careful 
proof  written  for  a  human  reader. 
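The difference between finding an instantiation and checking one can be seen in a small finite experiment. The Python sketch below takes the witness C = powerset(A) ∪ powerset(B) mentioned above and checks, for every subset x of a small universe, that x ⊆ A or x ⊆ B implies x ∈ C. The particular sets, the universe, and the function names are assumptions made for this illustration, and a check over finite sets is of course no substitute for a proof.

```python
from itertools import combinations

def powerset(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = list(s)
    return {frozenset(chosen)
            for size in range(len(items) + 1)
            for chosen in combinations(items, size)}

def witness_ok(A, B, C, universe):
    """Check (forall x)(x <= A or x <= B  =>  x in C) over subsets x of universe."""
    return all(x in C for x in powerset(universe) if x <= A or x <= B)

A = frozenset({1, 2})
B = frozenset({2, 3})
universe = A | B | {4}

good = powerset(A) | powerset(B)     # the instantiation supplied manually
poor = powerset(A)                   # a plausible but wrong guess

print(witness_ok(A, B, good, universe))
print(witness_ok(A, B, poor, universe))
```

Once the witness is given, the check is routine; the difficulty for an automatic prover lies entirely in proposing the set former in the first place, since the existential axioms permit any such set to be proposed.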

Correctness  proofs  for  numerical/scientific  programs  will  need  to  make 
use  of  theorems  taken  from  real  and  complex  analysis.  Hence  proof  verification 
techniques  which  can  handle  theorems  in  analysis  are  of  interest.   An 
interesting  technique  here  is  to  make  use  of  the  methods  of  non-standard 
analysis,  which,  by  imbedding  the  ordinary  objects  of  analysis  within  a 
systematically  constructed  family  of  completions,  can  often  manage  to 
remove  one  layer  of  quantifiers  from  the  statement  and  proof  of  analytic 
facts.  This  technique  has  been  discussed  by  Ballantyne  and  Bledsoe. 

In the preceding discussion of automatic non-instantiating and
instantiating  theorem  provers,  we  have  repeatedly  noted  situations  in  which 
efficient  methods  for  handling  particular  special  statement  classes  are 
already  known,  and  in  which  like  methods  for  important  related  cases  can 
probably  be  devised.   It  is  therefore  important  for  the  designer  of  a 
proof  verifier  to  try  to  provide  facilities  allowing  efficient  incorporation 
of  procedures  for  treatment  of  special  statement  classes,  as  such  procedures 
are  developed.   If  adequate  facilities  of  this  kind  can  be  provided,  smooth 
verifier  growth  avoiding  repeated  redesign  and  re-implementation  may  become 
possible. 

We  can  summarise  our  lengthy  review  of  the  present  status  of  automatic 
theorem provers as follows. The prospective power of completely general
instantiating  theorem  provers  seems  limited,  but  should  be  exploited  as  far 
as  possible.   More  powerful  instantiating  techniques  for  dealing  with 
important  special  relationships  such  as  equality  also  exist.   These  need 
to  be  improved,  extended  to  other  significant  relationships  such  as 
inequality and set inclusion, and more adequately integrated with general
resolution techniques. Especially in the important set-theoretic context,
the  definitional  method  appears  to  be  very  significant,  and  should  be 
explored  more  fully.   More  can  probably  be  done  by  way  of  automatic 
instantiation  of  set-valued  variables,  and  progress  along  this  line  might 
improve  the  performance  of  set-theoretic  theorem  provers  significantly. 
Non-instantiating  provers  and  provers  oriented  toward  special  subtheories 
can  handle  important  classes  of  statements  efficiently,  but  need  to  be 
more  adequately  integrated  as  auxiliary  subprocedures  within  general 
instantiating  frameworks.   It  should  also  be  possible  to  develop  efficient 
non-instantiating provers for significant classes of propositions that have
not yet been considered. Overall system structures allowing smooth
extension of the collection of proof procedures constituting a verification
system's core need to be designed.

Improvement  of  proof  mechanisms  along  roughly  the  lines  indicated,  and 
adequate integration of these mechanisms into a total verifier system, may make it
possible  to  build  a  prover  which  is  able  to  take  reasonably  large  proof  steps,  as 
long  as  these  steps  involve  no  non-trivial  set-theoretic  instantiations 
and  only  a  very  little  general  predicate  manipulation.   The  user  of  such 
a  verifier  might  find  it  reasonably  comfortable,  once  he  accepted  the 
necessity  of  supplying  all  non-trivial  instantiations  manually.   After  an 
initial  period  of  design  and  discussion,  a  serious  experimental  attempt 
to  build  such  a  system  may  be  in  order. 


5.   Verification  of  parallel  programs. 

Parallel  programming  is  an  area  in  which  formal  verification  methods 
are  particularly  desirable.   Whereas  the  author  of  a  serial  program  P  will 
normally  be  able  to  provide  fairly  accurate  statements  of  the  most  central 
assertions needed to prove P's correctness, a programmer trying to
orchestrate  a  family  of  parallel  processes  will  soon  become  very  badly 
confused  by  the  great  multiplicity  of  sequencing  possibilities  inherent  in 
parallel  process  execution.   Unfortunately,  however,  parallel  program 
verification  technology  is  substantially  less  developed  than  serial  process 
verification  technique.   Far  from  being  able  to  prove  operating  system 
correctness at the present time, we are barely able to state what it means
for such a program to be correct. However, the development of structured
concurrent-process languages (e.g. concurrent PASCAL, MODULA, and GYVE)
is beginning to dispel some of the confusion that has surrounded this area,
and various recent foundational studies (notably by Ashcroft and Owicki)
have  clarified  some  of  the  issues  connected  with  parallel  process 
verification.   Among  other  things,  these  studies  define  variations  of 
the  Floyd  approach  suitable  for  use  in  connection  with  parallel  programs. 
However,  for  the  correctness  of  large  parallel  systems  to  be  proved, 
appropriate  transformational  formalisms,  as  well  as  more  sophisticated 
techniques  for  organising  inductive  assertions  in  the  parallel  case,  will 
have  to  be  developed. 
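The multiplicity of sequencing possibilities mentioned above can be made concrete with a very small experiment. In the Python sketch below, two processes A and B each increment a shared counter by the two separate steps 'read the counter' and 'write back the value read plus one'; the sketch enumerates every interleaving of these steps and records the final value of the counter. The example, the function names, and the two-step model of an increment are assumptions made for this illustration and are not drawn from the studies cited.

```python
from itertools import combinations
from math import comb

def interleavings(steps_a, steps_b):
    """Yield every merge order of two processes as a string of 'A' and 'B'."""
    total = steps_a + steps_b
    for a_slots in combinations(range(total), steps_a):
        yield "".join("A" if i in a_slots else "B" for i in range(total))

def run(schedule):
    """Each process executes: tmp := counter; counter := tmp + 1."""
    counter = 0
    tmp = {"A": None, "B": None}
    step = {"A": 0, "B": 0}
    for process in schedule:
        if step[process] == 0:
            tmp[process] = counter          # first step: read
        else:
            counter = tmp[process] + 1      # second step: write
        step[process] += 1
    return counter

schedules = list(interleavings(2, 2))
outcomes = {schedule: run(schedule) for schedule in schedules}
for schedule, result in sorted(outcomes.items()):
    print(schedule, result)

# Number of interleavings of two processes of 10 steps each.
print(comb(20, 10))
```

Even this two-statement program has six possible execution orders, of which only two give the intended result 2; the other four lose an update. For two processes of ten steps each the number of orders already exceeds 180,000, which suggests why assertions about parallel programs must be organised so as not to depend on enumerating execution sequences.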

A  complete  verification  formalism  developed  for  application  to  parallel 
programs  will  have  to  include  techniques  for  proof  of  deadlock  avoidance, 
fairness  of  service,  nontermination,  and  other  related  properties  which 
have  no  analogs  in  single  process  programming.   Methods  allowing  treatment 
of  issues  of  this  sort  are  only  now  starting  to  appear,  and  we  still  have 
little  sense  of  how  difficult  these  questions  really  are. 


6.   What  has  been  accomplished  to  date? 

Although  in  assessing  the  present  status  of  formal  verification 
technology  we  can  hardly  avoid  the  conclusion  that  these  techniques  are 
still in their infancy, the last decade of work has strengthened program-proving
technique in significant respects. Progress to date can be grouped
under four headings.

a.  Foundational.   Foundational  studies  have  extended  the  reach  of 
formal  correctness-proving  technique  to  apply  broadly  to  the  semantic 
features  used  in  actual  programming  languages.   For  example,  correctness 
rules  that  can  be  used  in  connection  with  programs  making  use  of  recursive 
procedures,  pointers,  coroutines,  and  parallelism  have  been  defined. 
Studies  in  this  area  have  also  established  important  logical  properties, 
e.g.   logical  completeness,  of  systems  of  program-proof  rules. 

b. Pragmatic. As already noted, the central aim of this area of
work is to reduce the cost of formal program verification. The most promising
direction of work in this area appears to be the development of formalisms and
systems which allow correct programs to be transformed and combined in
ways  guaranteed  to  preserve  correctness.   This  area  of  work  also  benefits 
directly  from  ongoing  research  in  definition  of  very  high  level  languages 
(which  simplify  the  problem  of  correctness  proof  by  automating  the  final 
steps  of  program  development),  global  program  analysis  and  optimisation 
(which  provides  techniques  which  allow  considerable  amounts  of  relatively 
simple  information  usable  by  a  verification  system  to  be  developed 
automatically), and high level source-to-source program transformation
(which has begun to make us aware that many programs can be regarded as
optimised versions of very straightforward algorithms, or even mathematical
relationships. The work of Earley and of Paige on set-theoretic strength
reduction, and of Darlington, Burstall, Strong, and Walker on reduction of
recursive algorithms to more efficient iterative forms is of particular
interest here.)

c.  Verification  examples.   By  now  quite  a  number  of  programs  have 
been verified semi-formally, and a few have even been verified in a fully
formal manner. The examples that have been treated serve to measure the
power of existing techniques and to highlight commonly occurring situations
for which special mechanisms are appropriate. In a semi-formal verification,
one supplies an algorithm along with a full set of inductive assertions, but then
gives  only  an  informal  proof,  in  the  style  of  ordinary  published  mathematics, 
of  the  implications  between  these  assertions  that  would  enter  into  a  full 
formal  verification.  Among  the  algorithms  for  which  rather  detailed 
'verification  scenarios'  of  this  kind  have  been  given,  we  may  note  various 
relatively  elementary  algorithms  such  as  approximate  real  quotient,  prime 
factorisation, calculation of primes by the sieve method, as well as a few
considerably  more  composite  programs  such  as  LISP  subset  compilers,  a 
verification  condition  generator,  a  (concurrent)  operating  system  fragment, 
and  a  fragment  of  a  data  communication  system.   One  current  set-theoretically 
oriented  study  is  attempting  to  attack  various  fairly  complex  combinatorial 
algorithms, e.g. certain high-efficiency graph-theoretic algorithms due to
Tarjan. 

It  is  worth  noting  that,  if  taken  up  in  a  semi-formal  sense,  the  problem 
of compiler verification is a particularly easy one to treat, since it is a
tractable instance of the special problem of proving program equivalence,
and  often  one  in  which  equivalence  can  be  proved  by  relating  source  and 
target  code,  section  by  local  section.   For  this  reason,  compiler  correctness 
has  been  considered  particularly  often  in  the  literature. 

d.   Verification  systems  and  formal  verifications.   Fairly  complete 
program  verification  systems  have  been  built  or  are  in  process  of  development 
at  a  number  of  places,  including  Stanford  University  (Luckham),  ISI  (London, 
Gerhart,  and  collaborators),  University  of  Texas  (Good),  SRI,  and  the  IBM 
Yorktown  research  laboratory.   The  most  ambitious  of  these  systems  aim  to 
integrate  state  of  the  art  proof  verifiers  within  top-level  mechanisms 
oriented  toward  a  particular  programming  language.   Most  of  the  programs 
that  have  actually  been  verified  in  such  systems  are  either  quite  small, 
e.g. integer quotient or greatest common divisor algorithms, or intermediate
in size, e.g. simple sorting algorithms, list reversal and other basic
LISP functions, etc., though a few more substantial code fragments have
(by great effort) also been verified. Currently attempts are under way
to attack more substantial cases, e.g. a (purely serial) security kernel
taken from the PDP-11 UNIX operating system, and a file updating procedure
several  hundred  statements  long;  how  adequately  one  can  handle  cases  of 
this  complexity  using  current  techniques  remains  to  be  seen. 

Most  current  program  verification  systems  focus  on  languages  of  roughly 
the semantic level of PASCAL. In the opinion of the present author, this
is unfortunate, since attempting to operate at so low a
semantic level is bound to force consideration of numerous programming
details  that  could  otherwise  be  bypassed,  and  that  will  have  to  be  bypassed 
if  formal  program  verification  is  to  become  a  broadly  useful  technique. 
Of  course,  the  popularity  of  PASCAL  reflects  the  fact  that  no  comparably 
structured  programming  language  of  substantially  higher  semantic  level  has 
yet  come  into  wide  use.  None  of  the  current  verification  systems  make  any 
formal  use  of  program  transformation  methods,  and  this  is  also  unfortunate 
since  transformational  techniques  can  probably  simplify  the  task  of  formal 
verification very considerably.


7.   Various special program verification contexts.

Cut-down  versions  of  general  verification  techniques  can  often  serve 
as  useful  extensions  of  the  collection  of  informal  tools  used  in  ordinary 
debugging. Microprogram certification is an area in which such an approach
is  particularly  valuable.  Microprograms  normally  involve  large  numbers  of 
small  actions  which  collectively  accomplish  rather  simple  actions,  but 
which  are  nevertheless  hard  to  follow  manually.  As  shown  in  the  work  of 
Leeman,  Carter,  and  their  collaborators  at  IBM  Research,  formal  program 
verification  techniques  can  be  used  to  establish  section-by-section 
correspondences  between  microcode  and  higher  level  code,  thus  considerably 
alleviating  the  problem  of  microcode  certification.   This  same  technique, 
demonstrating  a  formal  sectional  correspondence  between  lower  level  and 
higher  level  variants  of  a  code,  has  been  used  to  show  the  correctness  of 
manually  optimised  assembly  language  variants  of  LISP  functions.  A  similar 
technique  can  be  applied  to  the  problem  of  verifying  the  correctness  of 
boolean  designs  for  large  scale  integrated  circuits,  exhaustive  case-by-case 
testing  of  which  is  just  as  impossible  as  completely  exhaustive  case-by-case 
program  testing. 
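
The idea of a sectional correspondence can be illustrated by a minimal sketch. The two-section computation, the register names, and the function names below are hypothetical and are not taken from the IBM or LISP work cited above. Note also that the sketch merely tests the correspondence on sample inputs, whereas a verifier would prove it for all inputs.

```python
# Hypothetical two-section example: a higher level code and a
# hand-optimised lower level variant, compared at section boundaries.

def high_level_sections(x, y):
    """Yield the abstract value held at the end of each section."""
    t = 10 * x            # section 1
    yield t
    r = t + y             # section 2
    yield r

def low_level_sections(x, y):
    """Yield register r1 at the end of each section of the optimised variant."""
    r1 = x << 3           # section 1: 8*x
    r2 = x << 1           #            2*x
    r1 = r1 + r2          #            10*x, now held in r1
    yield r1
    r1 = r1 + y           # section 2
    yield r1

def sections_correspond(x, y):
    """True if the two variants agree at the end of every section."""
    return list(high_level_sections(x, y)) == list(low_level_sections(x, y))

def check_samples(samples):
    """Check the correspondence on a list of (x, y) sample inputs."""
    return all(sections_correspond(x, y) for x, y in samples)
```

Comparing the variants section by section, rather than only at the end, localises any discrepancy to a short stretch of low level code.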


8.   Limitations  of  formal  verification  technology.

In  spite  of  all  the  pragmatic  difficulties  noted  in  the  preceding 
pages,  full  formal  proof  of  the  correctness  of  programs  which  have  an 
essentially  combinatorial  character  is  possible  in  principle  and  will  in 
due  course  become  practical.   This  will  allow  verification  of  many  important 
algorithms  and  of  important  composite  programs  including  compilers,  major 
data  processing  applications,  operating  systems,  etc.   However,  the  technique 
of  formal  proof  cannot  be  expected  to  apply  in  completely  decisive  fashion 
to  all  important  areas  of  current  computer  usage.   Scientific  computation 
is  one  important  area  to  which  this  remark  applies.  A  program  which 
performs  a  scientific  calculation  can  be  no  more  reliable  than  the  physical 
equations  which  it  uses,  and  of  course  those  are  always  merely  approximations 
to  some  underlying  physical  truth.   In  such  situations,  all  one  can  expect 
to  prove  is  that  a  program  really  does  solve  some  assumed  system  of 
mathematical  equations  set  up  to  describe  some  formally  specified
physical  configuration  (which  may  nevertheless  incorrectly  represent 
the  real  physical  situation  which  one  is  trying  to  model;  difficulties 
of  this  sort  lie  entirely  out  of  the  reach  of  any  technique  of 
mathematical  proof).  Another  fundamental  difficulty  in  the  scientific 
computation  area  is  that  computations  of  this  sort  always  make  use  of 
complex  numerical  approximations  which  numerical  analysts  believe  on
good  grounds  to  be  adequate  but  whose  precision  they  can  often  not  prove
formally  or  even  estimate  precisely.  Because  of  these  compounded
difficulties,  programs  which  perform  physical/scientific  and/or  approximate
numerical  computations  will  resist  attempts  at  formal  proof  even  after 
formal  proof  of  programs  of  more  purely  combinatorial  character  has  become 
a  well  established  technique.  Compounding  (and  perhaps  reflecting)  this 
difficulty,  there  is  at  present  very  little  work  on  the  correctness  of 
numerical  programs. 
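
A small illustration of the approximation difficulty, offered only as a sketch: in floating-point arithmetic, a program that is a faithful transcription of a real-number formula need not return the real-number result. The function name naive_sum is hypothetical; math.fsum is the correctly rounded summation routine of the Python standard library.

```python
import math

def naive_sum(values):
    """Left-to-right floating-point summation; rounding errors accumulate."""
    total = 0.0
    for v in values:
        total += v
    return total

tenths = [0.1] * 10

# Over the reals the sum is exactly 1; the naive loop does not return 1.0.
naive = naive_sum(tenths)

# math.fsum tracks the lost low-order bits and returns the correctly
# rounded sum of the stored values.
accurate = math.fsum(tenths)

# Floating-point addition is not associative, so two transcriptions of the
# same real-number expression can disagree.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
```

A correctness proof for such a code must therefore state what accuracy is claimed, not merely which equation is being evaluated.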


9.   Future  directions  and  landmarks.

Several  government  agencies,  including  DOD,  NASA,  and  DOE,  use
computers  to  control  equipment  whose  malfunction  or  failure  can  have 
catastrophic  consequences,  and  hence  have  an  inescapable  interest  in  the 
improvement  of  techniques  for  rigorous  mathematical  proof  of  program 
correctness.  Yet,  as  we  have  noted,  in  spite  of  over  two  decades  of  work 
by  many  investigators  these  techniques  are  still  highly  experimental. 
For  a  rigorous  verification  technology  to  become  practical,  many  conceptual 
and  pragmatic  improvements  to  current  approaches  will  have  to  be  invented. 
Program  proving  methodology  must  therefore  be  regarded  as  an  area  for
sustained  nurture  over  the  next  decade,  on  a  scale  that  increases  as
justified  by  technical  progress  and  perceived  opportunities.

Such  a  great  variety  of  technical  problems  affect  this  area  that  to 
support  it  adequately  and  in  the  most  helpful  way  will  not  be  easy.   In 
order  to  focus  on  the  work  that  needs  to  be  done  and  the  support  problems 
likely  to  be  encountered,  it  is  useful  to  classify  the  needed  work  into 
two  categories:  theoretical  work  requiring  only  modest  levels  of  support 
and  experimental  work  requiring  much  more  substantial  support.   In  the
theoretical  area  we  will  note  a  number  of  significant  directions  of  work 
that  ought  to  be  supported.   For  the  experimental  area,  we  will  note 
various  landmarks  which  will  need  to  be  passed  as  a  practical  program
verification  technology  is  developed. 

The  following  theoretical  efforts  seem  promising  enough  to  deserve 
special  mention: 

a.  Correctness  and  accuracy  of  large  numerical  codes.   Although 
quite  a  bit  of  experimental  work  on  quality  of  individual  numerical  software 
modules  is  reported,  there  seems  to  be  very  little  work  on  adaptation  of 
correctness  proof  techniques  to  allow  rigorous  estimation  of  accuracy  for 
complex  numerical  codes.   Since  we  touch  here  upon  difficult  questions  in 
numerical  analysis  this  cannot  be  an  easy  area  in  which  to  work,  but  it 
plainly  deserves  more  attention  than  it  is  currently  getting. 

b.  Correctness  issues  in  concurrent  programming.   As  noted, 
concurrent  programming  is  a  particularly  important  area,  but  one  in  which 
fundamental  techniques  are  only  now  being  developed.   Theoretical  work  to 
perfect  these  techniques  is  still  necessary.  Among  the  many  questions 
for  which  better  answers  are  still  needed,  we  note  the  following:  What 
are  practical  techniques  for  proving  deadlock  avoidance  and  fairness  of 
service?  How  can  real-time  programs  be  proved  not  to  miss  critical 
deadlines?  Can  a  transformational  theory  of  concurrent  programs  be 
developed?  Can  the  concurrent  aspects  of  a  program  be  separated  from  its 
sequential  aspects  cleanly  enough  to  make  relatively  separate  treatments 
of  correct  action  and  correct  synchronisation  possible?  Can  an  abstract 
approach  to  the  synchronisation  structure  of  concurrent  programs  be  developed, 
allowing  properties  of  synchronisation  structures  to  be  established  even 
before  the  serial  portion  of  a  code  has  been  specified?  Generally  speaking, 
what  formalisms  might  facilitate  correctness  proofs  for  concurrent  programs? 

c.  Collection  of  program  transformations  likely  to  aid  in  simplifying 
correctness  proofs.   The  notion  that  programs  should  be  developed  by 
semiautomatic  application  of  optimising  techniques  to  very  succinct  high 
level  specifications  is  gaining  currency.  The  work  of  Bauer  and  his 
collaborators  at  the  Technical  University  of  Munich  reflects  this  idea  in 
an  interesting  way.   Unfortunately  no  current  verifier  systems  have  yet
made  use  of  this  idea.  Much  needs  to  be  done  in  this  area,  including 
collection  and  codification  of  program  transformations  which  account  for 
commonly  occurring  program  and  data  structures,  taxonomic  study  of  typical
examples,  and  definition  of  effective  transformational  formalisms  for 
correctness  proof. 

d.  Special  techniques  for  compiler  verification.   Verified  compilers, 
which  can  be  shown  to  transform  source  code  into  logically  equivalent  target 
code,  can  obviously  be  used  as  tools  to  simplify  the  process  of  program 
verification.   Special  formalisms  allowing  all  the  phases  of  such  compilers 
to  be  verified  are  therefore  desirable.   Formalisms  useful  in  connection 
with  the  attribute  analysis,  optimisation,  and  high-quality  code  generation 
phases  of  a  compiler  can  probably  be  developed. 

e.  Extensibility  of  verification  systems.   We  now  know  various 
general  metamathematical  principles  which  define  extensible  verifiers 
capable  of  incorporating  new  verification  formalisms  as  such  formalisms 
are  developed.   On  the  other  hand,  we  do  not  have  adequate  techniques 
for  proving  the  metatheorems  needed  to  justify  such  extended  formalisms. 
Analysis  of  desirable  extensions  and  attempts  to  simplify  proof  of  their 
justifying  metatheorems  are  therefore  desirable. 
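
To make the goal stated under item d concrete, the following minimal sketch shows the kind of statement a compiler verification must establish: that running the target code produced for a source expression gives the same value as the source semantics. The expression format, the stack-machine instructions, and all function names are hypothetical. The sketch checks the property on examples only, where a verified compiler would carry a proof for every source program.

```python
def evaluate(expr):
    """Source semantics: expr is an int or a tuple (op, left, right)."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b

def compile_expr(expr):
    """Translate an expression into a list of stack-machine instructions."""
    if isinstance(expr, int):
        return [("push", expr)]
    op, left, right = expr
    last = ("add",) if op == "+" else ("mul",)
    return compile_expr(left) + compile_expr(right) + [last]

def run(code):
    """Target semantics: execute stack code and return the top of the stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if instr[0] == "add" else a * b)
    return stack[-1]

def compiler_agrees(expr):
    """The correctness statement, checked here for one expression only."""
    return run(compile_expr(expr)) == evaluate(expr)
```

For a compiler this small the general statement follows by induction on the structure of the expression. The special formalisms called for above are needed because attribute analysis, optimisation, and code generation do not admit so direct an argument.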

Experimental  correctness-proof  systems. 

To  develop  a  practical  technology  of  program  proof  without  building 
a  number  of  experimental  verification  systems  is  not  really  possible.   On 
the  other  hand,  such  developments  are  bound  to  be  rather  expensive.   For 
this  reason,  one  wants  to  favor  experimental  developments  which 

(i)  Begin  with  a  well-thought-out  and  reasonably  detailed  design
which  is  arguably  superior,  in  some  significant  way,  to  earlier  experimental 
systems  and  competing  designs. 

(ii)  Address  goals  which  are  well  defined  enough  and  familiar  enough 
for  a  clear  comparison  with  other  similar  experiments  to  be  possible. 

(iii)   Leave  behind  a  well-documented  collection  of  modules  and  a
clearly  defined  and  documented  system  approach,  ultimately  with  an  account 
of  the  advantages  and  shortcomings  of  each  such  module  and  approach,  as 
experienced  during  an  adequate  period  of  experimental  use. 

(iv)  Use  existing  systems  components  where  possible,  and/or  promise 
to  produce  systems  components  likely  to  be  useful  in  subsequent  experiments. 

Unfortunately,  any  significant  verification  system  is  apt  to  require 
quite  a  list  of  substantial  software  components,  many  of  which  will  have 
to  be  built  or  at  any  rate  specially  modified  for  use  within  the  system. 
These  include 

a.  A  fairly  full  compiler  for  the  programming  language  which  the 
verifier  will  handle.   Such  a  compiler  can  omit  the  code  generator  which 
most  compilers  need,  but  will  have  to  include  syntax  and  variable-attribute
analysers,  and  probably  also  a  package  of  global  program
analysis  routines. 

b.  Program  transformation  routines  and  verification  condition 
generators. 

c.  A  proof  verification  package,  and  possibly  also  algebraic  and 
logical  simplification  procedures. 

d.  Program  library  maintenance  aids  and  editors  which  keep  track
of  the  degree  to  which  programs  have  been  verified  and  of  the  effect  on 
this  'verification  status'  of  each  program  change. 

e.  Various  utilities,  including  specialised  input  and  display  routines, 
embedded  within  an  adequately  interactive  system. 

Any  system  for  which  even  a  few  of  these  components  need  to  be  specially 
built  or  extensively  modified  will  obviously  be  expensive.   Clearly  then,  it 
will  only  be  possible  to  support  a  very  few  experimental  systems. 

In  assessing  the  success  of  a  series  of  experimental  systems  efforts, 
progress  can  be  measured  in  terms  of  growing  ability  to  deal  with  major 
language  facilities  such  as  recursion  and  parallelism,  by  ability  to 
handle  large  systems  of  programs,  by  ability  to  re-use  verified  fragments, 
and  in  terms  of  what  might  be  called  a  system's  verification  ratio,  namely 
the  total  number  of  lines  of  additional  material  which  need  to  be  entered 
to  verify  a  program  P  in  all  detail,  divided  by  the  length  of  P.   In
these  respects,  the  following  landmark  targets  can  serve  to  measure 
progress: 

Landmark  1:   Formal  verification  of  substantial  combinatorial 
algorithms,  e.g.   Ford-Johnson  tournament  sort,  maximum  flow  in  a  graph,
LR  parsing,  fast  polynomial  multiplication. 

Landmark  2:   Redo  of  Landmark  1  with  substantially  reduced  verification 
ratios,  e.g.  verification  ratios  less  than  5.

Landmark  3:   Full  formal  verification  of  a  complete  compiler,
including  parser,  attribute  analyser,  and  code  generator. 

Landmark  4:   Redo  of  Landmark  3  for  several  related  compiler  variants
which  use  various  efficient  data  structures.   This  redo  should  demonstrate 
techniques  which  avoid  large  verification  ratios,  possibly  by  starting  from 
a  high  level  compiler  variant  and  improving  it  in  various  ways. 

Landmark  5:   Formal  verification  of  a  substantial  application  program
(e.g.  a  statistical  analysis  package)  involving  multiple  subprocedures. 
This  effort  should  involve  substantial  re-use  of  verified  code  fragments 
and  demonstrate  that  verification  ratio  need  not  rise  with  increasing 
program  length. 

Landmark  6:   Verification  of  a  substantial  operating  system  fragment, 
e.g.   the  dispatching  and  scheduling  mechanisms  of  a  small  timesharing
system.   This  should  demonstrate  techniques  which  avoid  verification
ratios  much  larger  than  5.

Landmark  7:   Verification  of  a  substantial  fragment  of  a  real-time 
system,  with  proof  that  key  real-time  deadlines  will  be  met. 

Landmark  8:   Complete  verification  of  a  small  timesharing  system, 
with  formal  proof  of  absence  of  deadlock  and  fairness  of  service. 

Landmark  9:   Formal  verification  of  a  substantial  numerical  package, 
e.g.   one  which  uses  the  fast  Fourier  transform  as  a  subprocedure,  with 
adequate  treatment  of  approximation  issues. 
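
The verification ratio used in the landmarks above is simple to compute once the additional material has been counted. The sketch below is illustrative only: the function names are hypothetical, and the default target of 5 is the figure quoted in Landmarks 2 and 6.

```python
def verification_ratio(additional_lines, program_lines):
    """Lines of additional material entered to verify P, divided by the length of P."""
    if program_lines <= 0:
        raise ValueError("program length must be positive")
    return additional_lines / program_lines

def meets_target(additional_lines, program_lines, target=5.0):
    """True if the verification ratio is below the given target."""
    return verification_ratio(additional_lines, program_lines) < target
```

For instance, a 300-line program verified with 1200 lines of additional material has a verification ratio of 4 and so meets the target of Landmark 2, whereas 3000 lines of additional material would give a ratio of 10.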


10.   The  Lipton  -  Perlis  -  de  Millo  controversy.

In  a  paper  presented  at  the  1977  POPL  conference,  Lipton,  Perlis,  and
de  Millo  present  arguments  which,  if  sustained,  would  entirely  undercut  the 
technical  perspectives  put  forward  in  the  present  survey,  since  they  argue 
that  fully  formal  verification  of  significant  programs  is  barely  possible 
and  can  at  any  rate  never  be  useful.   But  upon  close  examination,  the 
arguments  they  give  in  support  of  this  point  of  view  (which  has  subsequently 
become  very  well  known)  do  not  hold  water.   In  this  section  we  will 
summarise  their  principal  arguments,  along  with  what  seem  to  be  sufficient 
rejoinders. 

a.  They  argue  that  acceptance  of  ordinary  mathematical  proofs  is  an 
informal  social  process  sustained  by  the  interest  which  mathematicians  feel 
in  each  other's  work;  and  that  only  theorems  which  have  attractively  simple 
statements  can  attract  this  sort  of  interest.   Rejoinder:   True.   This  is
precisely  why  proofs  of  program  correctness  must  be  full,  formal,  and 
computer  checked  rather  than  informal  and  merely  manually  checked. 

b.  They  argue  that  formal  proofs  of  the  sort  of  complex  theorems 
arising  in  connection  with  large  programs  are  impossible  in  view  of  various 
known  metatheorems  which  imply  that  certain  short  mathematical  propositions 
can  only  have  proofs  which  are  immensely  long.   Rejoinder:   This  is  a
misapprehension.   In  an  adequate  verification  system,  a  program's  formal 
correctness  proof  will  ordinarily  have  much  the  same  overall  structure  as 
an  informal,  but  careful,  argument  which  the  author  of  the  program  might 
advance  in  trying  to  convince  another  programmer  (or  convince  himself)  that 
the  program  really  works.   Indeed,  if  he  is  doing  more  than  setting  down 
instructions  at  random,  a  program's  author  must  have  at  least  a  crude 
outline  of  such  a  proof  in  mind  as  he  composes  the  program.  Proofs  of  this 
kind  will  never  be  enormously  long.  Here  Lipton  et  al.  seem  to  have  been
misled  by  the  admitted  clumsiness  of  present  verification  systems  into 
imagining  that  informal  proofs  are  enormously  shorter  than  formal  proofs. 
But  this  is  wrong;   the  length  of  an  informal  proof  and  of  its  formalisation
are  related  by  a  constant  factor. 

c.  They  argue  that  programs,  like  other  objects  of  engineering,  do  not 
require  formal  proof,  and  that  all  that  is  required  (or  possible)  is  that 
programs  should  have  a  structure  which  is  intuitive  and  appealing  enough
to  be  'believable'.   Rejoinder:  Programs  are  vastly  more  complex  and
multifarious  than  other  engineering  constructs,  and  can  fail  totally 
if  a  single  critical  detail  is  wrong.   Given  this,  it  is  clear  that 
formalised  rules  of  correct  program  construction,  and  techniques  which 
allow  programs  to  be  checked  formally  for  adherence  to  these  rules,  are 
necessary.   To  say  that  programs  should  possess  a  'believable'  intuitive 
structure  is  no  help,  since  what  we  require  is  precisely  a  formal  tool 
which  will  allow  us  to  relate  the  detailed  text  of  a  program  to  the 
skeleton  of  intuitive  concepts  which  the  program's  author  intended  it  to 
embody.   Within  a  verification  system,  it  is  just  this  intuitive  skeleton 
that  reappears  as  the  set  of  'inductive  assertions'  that  need  to  be
attached  to  the  program  text. 
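
As an illustration of how the author's intuitive skeleton reappears as inductive assertions, consider the following minimal sketch. The division routine and its assertions are a hypothetical example, not drawn from any particular verification system. The assertions are written here as run-time checks, whereas a verifier would prove that they hold on every execution.

```python
def divide(n, d):
    """Quotient and remainder of n by d, computed by repeated subtraction."""
    assert n >= 0 and d > 0                  # entry assertion
    q, r = 0, n
    while r >= d:
        # Inductive assertion: the amount already subtracted plus the
        # remainder still accounts for all of n.
        assert n == q * d + r and r >= 0
        q, r = q + 1, r - d
    assert n == q * d + r and 0 <= r < d     # exit assertion
    return q, r
```

The loop assertion is exactly what a programmer would say informally: "q counts the copies of d removed so far, and r is what is left".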

The  Lipton  -  Perlis  -  de  Millo  arguments  do  serve  to  underscore  the
difficulty  of  the  problems  which  verification  technology  faces,  and  to 
remind  us  that  this  technology  is  still  a  goal  rather  than  an  established 
reality.   Overall,  however,  these  arguments  must  be  rejected  as  contributing 
more  smoke  than  light  in  a  manner  quite  uncharacteristic  for  these  eminent 
authors. 


11.   A  summary  comment.

We  see  the  developing  technology  of  formal  program  proof  as  an  area 
of  inescapable  importance,  but  also  as  an  area  in  which  progress  will  be 
slow  and  incremental  because  of  the  number  of  complex  subtechnologies 
needed  to  support  really  effective  verification  systems.   Among  these 
subtechnologies,  each  of  which  is  a  major  research  field  in  its  own  right, 
are: 

(i)  More  powerful  languages  for  succinct  and  highly  disciplined 
description  of  serial  and  parallel  programs. 

(ii)   Formal  program  transformation  systems  which  make  it  possible  to 
derive  programs  in  systematic  fashion  from  their  most  succinct  'root'  forms, 
and  which  facilitate  combination  of  verified  program  fragments  into  large 
programs.

(iii)  Improved  proof  verification  techniques. 